474 research outputs found

    Comparative genomics of regulation of heavy metal resistance in Eubacteria

    BACKGROUND: Heavy metal resistance (HMR) in Eubacteria is regulated by a variety of systems including transcription factors from the MerR family (COG0789). The HMR systems are characterized by a complex signal structure (a strong palindrome within a 19 or 20 bp promoter spacer), and usually consist of transporter and regulator genes. Some HMR regulons also include detoxification systems. The number of sequenced bacterial genomes is constantly increasing, and even though HMR regulons of the COG0789 type usually consist of few genes per genome, computational analysis may contribute to the understanding of the cellular systems of metal detoxification. RESULTS: We studied the mercury (MerR), copper (CueR and HmrR), cadmium (CadR), lead (PbrR), and zinc (ZntR) resistance systems and demonstrated that, by combining protein sequence analysis with analysis of DNA regulatory signals, it was possible to distinguish metal-dependent members of COG0789, assign specificity towards particular metals to uncharacterized loci, and find new genes involved in metal resistance, in particular multicopper oxidase and copper chaperones, candidate cytochromes from the copper regulon, new cadmium transporters and, possibly, glutathione-S-transferases. CONCLUSION: Our data indicate that the specificity of the COG0789 systems can be determined by combining phylogenetic analysis and identification of DNA regulatory sites. Taking into account signal structure, we can adequately identify genes that are activated using the DNA bending-unbending mechanism. In the case of regulon members that do not reside in single loci, analysis of potential regulatory sites could be crucial for the correct annotation and prediction of the specificity.

    Prospective surveillance of multivariate spatial disease data

    Surveillance systems are often focused on more than one disease within a predefined area. On those occasions when outbreaks of disease are likely to be correlated, the use of multivariate surveillance techniques integrating information from multiple diseases allows us to improve the sensitivity and timeliness of outbreak detection. In this article, we present an extension of the surveillance conditional predictive ordinate to monitor multivariate spatial disease data. The proposed surveillance technique, which is defined for each small area and time period as the conditional predictive distribution of those counts of disease higher than expected given the data observed up to the previous time period, alerts us to both small areas of increased disease incidence and the diseases causing the alarm within each area. We investigate its performance within the framework of Bayesian hierarchical Poisson models using a simulation study. Finally, we present an application to diseases of the respiratory system in South Carolina.

    MCMC based Generative Adversarial Networks for Handwritten Numeral Augmentation

    This is the author accepted manuscript. The final version is available from Springer via the DOI in this record. In this paper, we propose a novel data augmentation framework for handwritten numerals that combines probabilistic learning with generative adversarial learning. First, we transform numeral images from spatial space into vector space. A Gaussian-based Markov probabilistic model is then developed to simulate synthetic numeral vectors given limited handwritten samples. Next, the simulated data are used to pre-train the generative adversarial networks (GANs), which initializes their parameters to fit the general distribution of numeral features. Finally, we use the real handwritten numerals to fine-tune the GANs, which increases the authenticity of the generated numeral samples. The outputs of the GANs can then be employed to augment the original numeral datasets for training the follow-up inference models. Because all simulation and augmentation are performed in 1-D vector space, the proposed augmentation framework is more computationally efficient than those based on 2-D images. Extensive experimental results demonstrate that our proposed augmentation framework achieves improved recognition accuracy. This work was supported by grants from the Chinese Scholarship Council (CSC) program.

    Estimating the Under-Five Mortality Rate Using a Bayesian Hierarchical Time Series Model

    Background: Millennium Development Goal 4 calls for a reduction in the under-five mortality rate by two-thirds between 1990 and 2015, which corresponds to an annual rate of decline of 4.4%. The United Nations Inter-Agency Group for Child Mortality Estimation estimates under-five mortality in every country to measure progress. For the majority of countries, the estimates within a country are based on the assumption of a piece-wise constant rate of decline. Methods and Findings: This paper proposes an alternative method to estimate under-five mortality, such that the underlying rate of change is allowed to vary smoothly over time using a time series model. Information about the average rate of decline and changes therein is exchanged between countries using a Bayesian hierarchical model. Cross-validation exercises suggest that the proposed model provides credible bounds for the under-five mortality rate that are reasonably well calibrated during the observation period. The alternative estimates suggest smoother trends in under-five mortality and give new insights into changes in the rate of decline within countries. Conclusions: The proposed model offers an alternative modeling approach for obtaining estimates of under-five mortality which removes the restriction of a piece-wise linear rate of decline and introduces hierarchy to exchange information between countries. The newly proposed estimates of the rate of decline in under-5 mortality and the uncertaint
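
As a quick check of the 4.4% figure quoted in this abstract, the arithmetic can be worked through as follows. This is only a sketch: it assumes the annual rate of decline is measured on the log scale (a continuous rate), which the abstract does not state, and the symbols $U$ and $r$ are introduced here for illustration.

```latex
% U(t): under-five mortality rate in year t; r: annual rate of decline (log scale, assumed)
\[
  \frac{U(2015)}{U(1990)} = \frac{1}{3}
  \quad\Longrightarrow\quad
  r = \frac{\ln 3}{2015 - 1990} = \frac{1.0986}{25} \approx 0.0439 \approx 4.4\% \text{ per year.}
\]
% For comparison, the discrete year-on-year convention gives a slightly smaller value:
\[
  1 - \left(\tfrac{1}{3}\right)^{1/25} = 1 - e^{-0.0439} \approx 0.0430 \approx 4.3\% \text{ per year.}
\]
```

The two conventions differ by about 0.1 percentage points, so the quoted 4.4% matches the log-scale rate rather than the discrete one.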

    Use of linear mixed models for genetic evaluation of gestation length and birth weight allowing for heavy-tailed residual effects

    Background: The distribution of residual effects in linear mixed models in animal breeding applications is typically assumed normal, which makes inferences vulnerable to outlier observations. In order to mute the impact of outliers, one option is to fit models with residuals having a heavy-tailed distribution. Here, a Student's-t model was considered for the distribution of the residuals, with the degrees of freedom treated as unknown. Bayesian inference was used to investigate a bivariate Student's-t (BSt) model using Markov chain Monte Carlo methods in a simulation study, and analysis of field data for gestation length and birth weight made it possible to study the practical implications of fitting heavy-tailed distributions for residuals in linear mixed models. Methods: In the simulation study, bivariate residuals were generated using a Student's-t distribution with 4 or 12 degrees of freedom, or a normal distribution. Sire models with bivariate Student's-t or normal residuals were fitted to each simulated dataset using a hierarchical Bayesian approach. For the field data, consisting of gestation length and birth weight records on 7,883 Italian Piemontese cattle, a sire-maternal grandsire model including fixed effects of sex-age of dam and uncorrelated random herd-year-season effects was fitted using a hierarchical Bayesian approach. Residuals were defined to follow bivariate normal or Student's-t distributions with unknown degrees of freedom. Results: Posterior mean estimates of degrees of freedom parameters seemed to be accurate and unbiased in the simulation study. Estimates of sire and herd variances were similar, if not identical, across fitted models. In the field data, there was strong support based on predictive log-likelihood values for the Student's-t error model. Most of the posterior density for degrees of freedom was below 4. Posterior means of direct and maternal heritabilities for birth weight were smaller in the Student's-t model than those in the normal model. Re-rankings of sires were observed between heavy-tailed and normal models. Conclusions: Reliable estimates of degrees of freedom were obtained in all simulated heavy-tailed and normal datasets. The predictive log-likelihood was able to distinguish the correct model among the models fitted to heavy-tailed datasets. There was no disadvantage of fitting a heavy-tailed model when the true model was normal. Predictive log-likelihood values indicated that heavy-tailed models with low degrees of freedom values fitted gestation length and birth weight data better than a model with normally distributed residuals. Heavy-tailed and normal models resulted in different estimates of direct and maternal heritabilities, and different sire rankings. Heavy-tailed models may be more appropriate for reliable estimation of genetic parameters from field data.

    A semiparametric modeling framework for potential biomarker discovery and the development of metabonomic profiles

    Background: The discovery of biomarkers is an important step towards the development of criteria for early diagnosis of disease status. Recently, electrospray ionization (ESI) and matrix assisted laser desorption (MALDI) time-of-flight (TOF) mass spectrometry have been used to identify biomarkers in both proteomics and metabonomics studies. Data sets generated from such studies are generally very large and thus require sophisticated statistical techniques to glean useful information. Most recent attempts to process these types of data model each compound's intensity either discretely by positional (mass to charge ratio) clustering or through each compound's own intensity distribution. Traditionally, data processing steps such as noise removal, background elimination and m/z alignment are carried out separately, resulting in unsatisfactory propagation of signals in the final model. Results: In the present study a novel semi-parametric approach has been developed to distinguish urinary metabolic profiles in a group of traumatic patients from those of a control group consisting of normal individuals. Data sets obtained from the replicates of a single subject were used to develop a functional profile through a Dirichlet mixture of beta distributions. This functional profile is flexible enough to accommodate variability of the instrument and the inherent variability of each individual, thus simultaneously addressing different sources of systematic error. To address instrument variability, all data sets were analyzed in replicate, an important issue ignored by most studies in the past. Different model comparisons were performed to select the best model for each subject. The m/z values in the window of the irregular pattern are then further recommended for possible biomarker discovery. Conclusion: To the best of our knowledge this is the very first attempt to model the physical process behind time-of-flight mass spectrometry. Most state-of-the-art techniques do not take these physical principles into consideration while modeling such data. The proposed modeling process will apply as long as the basic physical principle presented in this paper is valid. Notably, we have confined our present work mostly to the modeling aspect. Nevertheless, clinical validation of our recommended list of potential biomarkers will be required. Hence, we have termed our modeling approach a "framework" for further work.

    Fast Approximate Geodesics for Deep Generative Models

    The length of the geodesic between two data points along a Riemannian manifold, induced by a deep generative model, yields a principled measure of similarity. Current approaches are limited to low-dimensional latent spaces, due to the computational complexity of solving a non-convex optimisation problem. We propose finding shortest paths in a finite graph of samples from the aggregate approximate posterior, a problem that can be solved exactly, at greatly reduced runtime, and without a notable loss in quality. Our approach is therefore applicable to high-dimensional problems, e.g., in the visual domain. We validate our approach empirically on a series of experiments using variational autoencoders applied to image data, including the Chair, FashionMNIST, and human movement data sets. Comment: 28th International Conference on Artificial Neural Networks, 201
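
The graph-based idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `decode` function is a hypothetical toy stand-in for a trained decoder, the Gaussian samples stand in for draws from the aggregate approximate posterior, and the k-nearest-neighbour construction and the edge weighting (distance between decoded points) are assumptions made only for illustration.

```python
import heapq
import math
import random


def decode(z):
    """Hypothetical toy decoder: maps a 2-D latent point to 3-D data space."""
    x, y = z
    return (x, y, math.sin(x) * math.cos(y))


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def build_knn_graph(latents, k=8):
    """Connect each sample to its k nearest latent neighbours.

    Edge weight = distance between decoded points, used here as a
    first-order approximation of curve length on the decoder-induced
    manifold (an assumption of this sketch).
    """
    decoded = [decode(z) for z in latents]
    graph = {i: {} for i in range(len(latents))}
    for i, z in enumerate(latents):
        nearest = sorted(
            (j for j in range(len(latents)) if j != i),
            key=lambda j: dist(z, latents[j]),
        )[:k]
        for j in nearest:
            w = dist(decoded[i], decoded[j])
            graph[i][j] = w
            graph[j][i] = w  # keep the graph undirected
    return graph


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm; returns math.inf if target is unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(queue, (nd, v))
    return math.inf


# Stand-in for samples from the aggregate approximate posterior.
rng = random.Random(0)
latents = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
graph = build_knn_graph(latents, k=8)
approx_geodesic = shortest_path_length(graph, 0, 1)
```

Because the graph is finite and all edge weights are non-negative, the shortest path is found exactly by Dijkstra's algorithm, which is the property the abstract contrasts with non-convex optimisation. The brute-force neighbour search is quadratic in the number of samples and would need replacing for large sample sets.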